Author: Daniel A. Williams III
Subj: Probability distribution program for 48G/GX
Date: March 15, 1996

SUMMARY:
This is a suite of utilities written on an HP48GX
for eight of the most commonly used
probability distributions in statistics and probability.
Specifically, this program computes the M.G.F., 
the complement of the C.D.F. (not precisely true for discrete),
the inverse of the complement of the C.D.F.,
the mean, and the variance of the binomial, Gaussian, exponential,
Poisson, gamma, chi-squared, T, and F distributions.

CREDIT:
I wish to give credit to LUCA RADICE for the original version of the program.
Although these utilities are an almost total rewrite of the utilities
written by Luca Radice of Politecnico di Milano in '91, 
I would probably not have attempted to write this, had he not
written and uploaded an editable version of his PROBDIST package.

I also wish to thank and acknowledge the support of SUMMA of the
Mathematical Association of America, without whose CCRCA
Summer Institute Program this work would not have come to pass.

MOTIVATION:
What prompted me to download Radice's program in the first place was
my intent to use it in a Probability and Statistics class which I
am teaching as a Calculator-Aided Instruction course. All of the students
have HP48s to use. The HP48 has built-in routines to
compute upper tail probabilities for the most useful continuous distributions
in statistics. However, to render distribution tables completely obsolete
(as HP did with trig and log tables 25 years ago), the inverses
of these distribution functions are needed. Also, functions
for some of the more common discrete distributions are
conspicuously missing.

Unfortunately, Radice's probdist package provides
inverse functions for only two of the eight distributions he worked on.
Also, some of the formulas for his density functions are wrong,
I believe because he sometimes overlooked the fact that the
HP48's factorial function is not the gamma function, but a
unit translation of it (x! = Gamma(x+1)). In addition, some of his
functions are unnecessarily slow, because he does not capitalize on the
48's features (for example, he used the bisection algorithm to invert,
rather than using the HP48G's ROOT program) or short algorithms.
Finally, it was not "user friendly" enough to pass on
to the class, as it requires that you have his source code at hand
to know what does what. (For example, I clobbered one of his programs
by storing a value in it, because I mistook it for a parameter.)
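
The factorial remark above can be checked numerically. The following is
a small illustration in Python (not HP48 code, and not taken from either
package). The names hp_factorial and gamma_density are hypothetical, and
the gamma density is assumed to follow Ross's (alpha, lambda) form.

```python
import math

def hp_factorial(x):
    """Model of the factorial described above: x! = Gamma(x + 1)."""
    return math.gamma(x + 1)

def gamma_density(x, alpha, lam):
    """Gamma density, assuming Ross's (alpha, lambda) parameterization.

    Gamma(alpha) must be written as (alpha - 1)!, not alpha!, when the
    only tool available is a factorial that equals Gamma(x + 1).
    """
    return (lam * math.exp(-lam * x) * (lam * x) ** (alpha - 1)
            / hp_factorial(alpha - 1))
```

With alpha = 1 the density reduces to the exponential density
lam*exp(-lam*x), which is a quick sanity check on the shift by one.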

For all of the above reasons (and also because I wanted to gain proficiency
at programming on the HP48G), I decided to rewrite his program.

INSTRUCTIONS:
The STAT.DIR file is an HP48 directory in binary format (STAT.ASC is the
same thing in ASCII), which contains eight subdirectories,
one for each distribution.  Each directory contains six to eight global
variables (i.e. menu or function key items)
which contain programs, followed by one or two variables which are 
the stored parameters for the distribution. The program names and functions
are fairly consistent from directory to directory:

1. The first menu item names the density function.
It requires no arguments. Hitting the function key
causes the density function to be displayed on the screen in "pretty form."
Hit cancel to escape from it.

2. The 2nd menu item requires no arguments on the stack prior to running.
It is called PARAM (for parameters). Hitting the function key will prompt the
user to key in the appropriate parameters. To be sure you are interpreting
the parameter correctly, you may need to look at the density function
(menu item 1). I conformed to the notation (i.e. parameters) in Ross' book
"A First Course in Probability" for most of the distributions, but
for the T and F, I used Hogg and Craig's book
"Introduction to Mathematical Statistics."
As output, the program stores the user-input parameter values
into global variables in the current directory. These values are used by
all of the other programs in the directory. In addition,
when they exist, the values of the
mean and variance are displayed on the stack but not stored.
(d.n.e. means "does not exist".)

3. The 3rd menu item is the moment generating function, MGF(t),
when a formula for it exists. This function requires one (numerical)
argument on the stack. Anything you can do with a built-in
function (such as sin or log) you should be able to do with MGF.
Thus for example, to compute the first moment,
differentiate MGF and evaluate at t=0,
using the HP48's differentiation and "where" operators or the symbolic menu.
The output is returned to the stack.

4. The next menu item (only for the discrete distributions) is X->\GDP.
It requires one argument on the stack,
the value "a" for which you wish to compute Prob{X=a}.
The output (the probability mass function value P) is returned to the stack.

5. The next menu item is the upper tail distribution function, X->P.
It requires one argument on the stack,
the value "a" for which you wish to compute Prob{X>=a}.
The output (the probability P) is returned to the stack.

6. The next menu item is the only multivariable function, AB->P.
It requires two arguments on the stack,
the values of "a" and "b" for which you wish to compute Prob{a<=X<=b}.
The output (the probability P) is returned to the stack.

7. The last menu item which "does something" is the inverse function, P->X.
It requires one argument on the stack, the value of the
upper tail probability for which you wish to find the value of "a"
such that Prob{X>=a}=p.
For continuous distributions, the output (the value of "a")
is returned to the stack. For discrete distributions, the values
of X which give probabilities which bracket p are output to the stack.
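
As a rough illustration of what menu items 3 through 7 compute, here is
a sketch in Python for the binomial case. It is not the RPL code in the
package. The function names, the parameter values N and P, the use of a
numerical derivative, and the exact form of the bracketing output are
all assumptions made for the example.

```python
import math

N, P = 10, 0.3  # hypothetical parameters, standing in for what PARAM stores

def mgf(t, n=N, p=P):
    """Binomial moment generating function (p*e^t + 1 - p)^n."""
    return (p * math.exp(t) + 1 - p) ** n

def first_moment(h=1e-5):
    """Approximate d/dt MGF(t) at t = 0 with a central difference."""
    return (mgf(h) - mgf(-h)) / (2 * h)

def pmf(a, n=N, p=P):
    """Prob{X = a} for an integer a (role of item 4)."""
    if a < 0 or a > n:
        return 0.0
    return math.comb(n, a) * p ** a * (1 - p) ** (n - a)

def upper_tail(a, n=N, p=P):
    """Prob{X >= a}, summed from the mass function (role of item 5)."""
    return sum(pmf(k, n, p) for k in range(max(a, 0), n + 1))

def interval(a, b, n=N, p=P):
    """Prob{a <= X <= b} (role of item 6)."""
    return sum(pmf(k, n, p) for k in range(max(a, 0), min(b, n) + 1))

def inverse_bracket(prob, n=N, p=P):
    """Adjacent values (a, a+1) whose upper tails bracket prob (item 7).

    Returns the first a with upper_tail(a+1) < prob <= upper_tail(a).
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must satisfy 0 < prob <= 1")
    for a in range(0, n + 1):
        if upper_tail(a + 1, n, p) < prob:
            return a, a + 1
    raise RuntimeError("no bracket found")  # not reached for valid prob
```

Here first_moment() should come out close to n*p, and
inverse_bracket(0.05) returns two neighboring values of X, one whose
upper tail is at least 0.05 and one whose upper tail is below it.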


MISC:
As with Luca Radice's program, there is plenty of room for improvement:
*The package can be made much more compact by incorporating the repetitive
parts of the programs into subprograms and storing the subprograms one
directory level above the distribution directories.
*There are bound to be bugs that I have not found. 
*Someone who is more adept than I at programming in RPN could probably
make it faster and more user friendly while occupying less real estate.
*It could be transformed into a library, to prevent inadvertent 
corruption of the program by users, and the library could be written
to hide the not-for-the-user global variables behind a user menu. 
*There could be more error checking.
*More distributions could be added.

On the other hand, as the program stands, you can save space 
with no effort by eliminating any directory you do not have
use for without worrying about it affecting the other distribution
directories. In fact, within any given directory MOST of the menu items
are independent of each other, so you can delete what you don't want.
(Exceptions: X->P is sometimes used by AB->P and P->X in the continuous
distributions; X->\GDP is used by X->P, AB->P, and P->X in the discrete ones.)
Also, the programs can be edited to suit your taste; you may for example want
to turn X->P into the CDF, since its complement is already built into
the calculator for most of the distributions implemented.
The biggest memory hog is the gamma directory.

Tips For Tinkerers:
Most of the programs are straightforward, insofar as no
sophisticated numerical algorithms or probability theory are used (other
than, of course, what is employed in the HP48's built-in functions).
However:
*For the inverters, I find upper and lower bounds for the solution
using Markov's, Chernoff's, or Chebychev's inequalities.
I then hand off to ROOT the bounds and the midpoint of the bounds. 
I don't know what ROOT does when you feed it more than one guess,
but since the CDF is monotonic, it probably can't mess it up.
*The slowest programs and biggest memory hogs are in the gamma directory:
---The CDF function first checks whether it can use the chi-squared CDF;
if not, but \alpha > 0.5, it interpolates between two chi-squared
CDFs to get an approximation accurate to 2 digits; and if neither condition
is met, it integrates the improper integral to 4 digits of accuracy
by estimating the improper part of the interval of integration.
If you are willing to wait 5 minutes for a result, you can get
10 digits of accuracy using AB->P (which simply integrates the PDF)
instead of X->P.
---The inverter switches to
one decimal digit of accuracy, uses the bisection algorithm to
get a rough "guess" at the solution,
then switches back to maximal accuracy before passing
the guess on to ROOT.
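
The bracketing idea in the first bullet above can be sketched as
follows. This is Python, not the package's RPL. It uses the exponential
distribution and Markov's inequality only as an example, and plain
bisection stands in for the HP48's ROOT solver, whose internals are not
modeled here. The names upper_tail and invert_upper_tail are
hypothetical.

```python
import math

def upper_tail(a, lam):
    """Prob{X >= a} for an exponential distribution with rate lam."""
    return math.exp(-lam * a) if a > 0 else 1.0

def invert_upper_tail(p, lam, tol=1e-12):
    """Find a such that Prob{X >= a} = p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    mean = 1.0 / lam
    # Markov's inequality: Prob{X >= a} <= mean / a, hence a <= mean / p.
    lo, hi = 0.0, mean / p
    # The upper tail is decreasing in a, so the solution stays in [lo, hi].
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if upper_tail(mid, lam) > p:
            lo = mid   # tail still too large: solution lies to the right
        else:
            hi = mid   # tail at or below p: solution lies to the left
    return 0.5 * (lo + hi)
```

For the exponential case the answer is known in closed form,
a = -ln(p)/lam, which makes it easy to confirm that the bounds really
do enclose the solution before trying the same idea on a distribution
with no closed-form inverse.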


Daniel A. Williams III			Howard University
daw@scs.howard.edu			Washington, D.C.
